Matrix factorization is a common machine learning technique for recommender systems. Despite its high prediction accuracy, the Bayesian Probabilistic Matrix Factorization algorithm (BPMF) has not been widely used on large-scale data because of its high computational cost. In this paper we propose a high-performance parallel implementation of BPMF for both shared-memory and distributed architectures. We show that by using work stealing for efficient load balancing on a single node, and asynchronous communication in the distributed version, our implementation outperforms state-of-the-art implementations.
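To make the computational cost concrete, here is a minimal sketch (not the authors' implementation) of the standard BPMF Gibbs step for one user. It assumes the item factors `V`, the hyperprior mean `mu_u` and precision `lam_u`, and the observation precision `alpha` are held fixed during the step. The function names (`cholesky`, `sample_user`, and the two solvers) are hypothetical, and it is written in pure Python only to stay self-contained. The user's latent vector is drawn from a Gaussian with precision `lam_u + alpha * sum_j V_j V_j^T` and mean `prec^{-1} (lam_u mu_u + alpha * sum_j r_ij V_j)`.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y


def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y, reading L^T from the lower factor L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def sample_user(ratings, V, mu_u, lam_u, alpha, rng):
    """Draw one user's latent vector from its Gaussian conditional (hypothetical helper).

    ratings: list of (item_index, rating) pairs observed for this user
    V: current item latent vectors, each of length K
    mu_u, lam_u: hyperprior mean (length K) and precision matrix (K x K)
    alpha: observation (noise) precision
    """
    K = len(mu_u)
    prec = [row[:] for row in lam_u]  # starts as Lambda_U
    rhs = [sum(lam_u[a][b] * mu_u[b] for b in range(K)) for a in range(K)]  # Lambda_U mu_U
    # Cost of this loop grows with the user's number of ratings times K^2.
    for j, r in ratings:
        v = V[j]
        for a in range(K):
            rhs[a] += alpha * r * v[a]
            for b in range(K):
                prec[a][b] += alpha * v[a] * v[b]
    L = cholesky(prec)  # prec = L L^T, an O(K^3) step per user
    mean = solve_upper_t(L, solve_lower(L, rhs))  # mean = prec^{-1} rhs
    # If z ~ N(0, I), then L^{-T} z has covariance prec^{-1}.
    z = [rng.gauss(0.0, 1.0) for _ in range(K)]
    noise = solve_upper_t(L, z)
    return [m + e for m, e in zip(mean, noise)]
```

Each user's draw depends only on that user's ratings and the current item factors, so all users can be sampled in parallel (and symmetrically for items). Because the per-user cost grows with the number of ratings, heavily rated users create uneven tasks. This is the kind of imbalance a work-stealing scheduler is designed to even out.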